Cross Validation





Kerry Back

  • Cross-validation (CV) is a way to choose optimal hyperparameters using the training data
  • Split the training data into subsets, e.g., A, B, C, D, E
  • Define a finite set of hyperparameter combinations (a grid) to choose from
    • Example: {"max_depth": [3, 4], "learning_rate": [0.05, 0.1]}
    • Example: {"hidden_layer_sizes": [[4, 2], [8, 4, 2], [16, 8, 4]]}

  • Use one of the subsets (e.g., A) as the validation set
  • Train with each of the hyperparameter combinations on the union of the remaining subsets (e.g., B \(\cup\) C \(\cup\) D \(\cup\) E)
  • Compute the trained model scores on A
  • Repeat with B as the validation set, etc.
  • For each hyperparameter combination, end up with as many validation scores as there are subsets
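The splitting step above can be sketched with scikit-learn's KFold on synthetic data (the array X here is a stand-in, not the course data): each fold holds out one subset for validation and trains on the union of the rest.

```python
# Sketch of the CV splitting step: KFold partitions the row indices
# into 5 subsets; each fold validates on one subset and trains on
# the union of the other four. X is synthetic toy data.
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(20).reshape(10, 2)  # 10 toy observations
kf = KFold(n_splits=5)
for fold, (train_idx, val_idx) in enumerate(kf.split(X)):
    print(f"fold {fold}: validate on {val_idx}, train on {train_idx}")
```

With 5 splits, every observation appears in exactly one validation set, so each hyperparameter combination gets 5 validation scores.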

  • Average the validation scores to get a single score for each hyperparameter combination
  • Choose the hyperparameters with the highest average score
  • All of this together is “search over the grid using cross-validation to find the best hyperparameters”
  • It is implemented by scikit-learn's GridSearchCV class
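What GridSearchCV automates can be sketched by hand with scikit-learn's ParameterGrid and cross_val_score, again on synthetic data: score every grid point with 5-fold CV, average the fold scores, and keep the best combination.

```python
# Manual grid search sketch: what GridSearchCV does internally.
# X and y are synthetic; the grid matches the slide's first example.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import ParameterGrid, cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ rng.normal(size=5) + rng.normal(size=100)

param_grid = {"max_depth": [3, 4], "learning_rate": [0.05, 0.1]}
results = {}
for params in ParameterGrid(param_grid):
    # 5 validation scores per combination, averaged to a single score
    scores = cross_val_score(GradientBoostingRegressor(**params), X, y, cv=5)
    results[tuple(sorted(params.items()))] = scores.mean()

best = max(results, key=results.get)  # highest average score wins
print("best hyperparameters:", dict(best))
```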

Example

  • Same data as in 3a-trees
    • agr, bm, idiovol, mom12m, roeq
    • data = 2021-12 (training data)
  • Quantile transform features and ret

Cross validate gradient boosting

import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

param_grid = {
  "max_depth": [3, 4], 
  "learning_rate": [0.05, 0.1]
}

cv = GridSearchCV(
  estimator=GradientBoostingRegressor(),
  param_grid=param_grid,
)

_ = cv.fit(Xtrain, ytrain)
pd.DataFrame(cv.cv_results_).iloc[:, 4:]  # drop the timing columns

|   | param_learning_rate | param_max_depth | params | split0_test_score | split1_test_score | split2_test_score | split3_test_score | split4_test_score | mean_test_score | std_test_score | rank_test_score |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.05 | 3 | {'learning_rate': 0.05, 'max_depth': 3} | 0.217789 | 0.201953 | 0.126714 | 0.050924 | 0.173691 | 0.154214 | 0.060208 | 1 |
| 1 | 0.05 | 4 | {'learning_rate': 0.05, 'max_depth': 4} | 0.192301 | 0.203017 | 0.113426 | 0.021100 | 0.196805 | 0.145330 | 0.070192 | 2 |
| 2 | 0.1 | 3 | {'learning_rate': 0.1, 'max_depth': 3} | 0.176347 | 0.185516 | 0.119217 | 0.034083 | 0.152840 | 0.133601 | 0.054778 | 3 |
| 3 | 0.1 | 4 | {'learning_rate': 0.1, 'max_depth': 4} | 0.169261 | 0.181719 | 0.084938 | -0.009710 | 0.163443 | 0.117930 | 0.072327 | 4 |
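After fitting, GridSearchCV exposes the winning combination directly, and (with the default refit=True) a model refit on the full training data. A minimal sketch, using synthetic stand-ins for Xtrain and ytrain:

```python
# Retrieving the best hyperparameters and the refit model.
# Xtrain/ytrain are synthetic stand-ins for the course data.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)
Xtrain = rng.normal(size=(100, 5))
ytrain = Xtrain @ rng.normal(size=5) + rng.normal(size=100)

cv = GridSearchCV(
    estimator=GradientBoostingRegressor(),
    param_grid={"max_depth": [3, 4], "learning_rate": [0.05, 0.1]},
)
cv.fit(Xtrain, ytrain)
print(cv.best_params_)                       # winning combination
preds = cv.best_estimator_.predict(Xtrain)   # refit model, ready to use
```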